Abstract

From observational data alone, a causal DAG is in general only identifiable up to Markov equivalence. Interventional data generally improves identifiability; however, the gain of an intervention strongly depends on the intervention target, i.e., the intervened variables. We present active learning strategies calculating optimal interventions for two different learning goals. The first one is a greedy approach using single-vertex interventions that maximizes the number of edges that can be oriented after each intervention. The second one yields in polynomial time a minimum set of targets of arbitrary size that guarantees full identifiability. This second approach proves a conjecture of Eberhardt (2008) indicating the number of unbounded intervention targets which is sufficient and in the worst case necessary for full identifiability. We compare our two active learning approaches to random interventions in a simulation study.



1 Introduction 

Causal relationships between random variables are usually modeled by directed acyclic graphs (DAGs), where an arrow between two random variables, $X \to Y$, reveals the former (X) as a direct cause of the latter (Y). From observational data alone (i.e. passively observed data from the undisturbed system), directed graphical models are only identifiable up to Markov equivalence, and arrow directions (which are crucial for the causal interpretation) are in general not identifiable. Without the assumption of specific functional model classes and error distributions (Peters et al., 2011), the only way to improve identifiability is to use interventional data for estimation, i.e. data produced under a perturbation of the system in which one or several random variables are forced to specific values, irrespective of the original causal parents.

The investigation of observational Markov equivalence classes has a long tradition in the literature (Verma and Pearl, 1990; Andersson et al., 1997; Spirtes et al., 2000). In a recent paper, Hauser and Bühlmann (2012) presented a graph-theoretic characterization of interventional Markov equivalence classes for a given set of interventions (possibly affecting several variables simultaneously). In this paper, we present two active learning strategies for finding valuable interventions: one that greedily optimizes the number of orientable edges with single-vertex interventions, and one that minimizes the number of interventions at arbitrarily many vertices to attain full identifiability.

Several approaches for actively learning causal models have been proposed during the last decade. Our method for finding intervention targets of unbounded size is closely related to the approach of Eberhardt (2008). In contrast to his procedure, our algorithm has a polynomial time complexity; furthermore, we prove his conjecture on the number of intervention experiments sufficient and in the worst case necessary for fully identifying a causal model. He and Geng (2008) presented a method to find a (nearly) minimal set of single-vertex interventions which guarantee the orientability of all undirected edges of an observational Markov equivalence class. In contrast to their approach, we proceed in a greedy way which results in a smaller or at most equal number of single-vertex interventions to be performed. Finally, Tong and Koller (2001) proposed a Bayesian active learning strategy that minimizes an expected loss, in contrast to our frequentist methods.

This paper is organized as follows: in Sect. 2 we specify our notation of causal models and formalize our learning goals. In Sect. 3 we summarize graph-theoretic background material upon which our active learning algorithms, presented in Sect. 4, are based. In Sect. 5 we evaluate our algorithms in a simulation study.

2 Model 

2.1 Causal Calculus 

We consider a causal model on p random variables $(X_1, \ldots, X_p)$ described by a DAG D. Formally, a causal model is a pair (D, f), where D is a DAG on the vertex set $[p] := \{1, \ldots, p\}$ which encodes the Markov property of the (observational) density f: $f(x) = \prod_{i=1}^{p} f(x_i \mid x_{\mathrm{pa}_D(i)})$; $\mathrm{pa}_D(i)$ denotes the parent set of vertex i (see also Sect. 3). Unless stated otherwise, all graphs in this paper are assumed to have the vertex set [p].

Besides the conditional independence relations of the observational density implied by the Markov property, a causal model also makes statements about effects of interventions. We consider stochastic interventions (Korb et al., 2004) modeling the effect of setting or forcing one or several random variables $X_I := (X_i)_{i \in I}$, where $I \subset [p]$ is called the intervention target, to the value of independent random variables $U_I$. Extending the do() operator (Pearl, 1995) to stochastic interventions, we denote the interventional density of X under such an intervention by

$$f(x \mid \mathrm{do}_D(X_I = U_I)) := \prod_{i \notin I} f(x_i \mid x_{\mathrm{pa}_D(i)}) \prod_{i \in I} \tilde{f}(x_i),$$

where $\tilde{f}$ is the density of $U_I$ on $X_I$. By allowing $I = \emptyset$ and using the convention $f(x \mid \mathrm{do}(X_\emptyset = U_\emptyset)) = f(x)$, we also encompass the observational case as an intervention target.

The interventional density $f(x \mid \mathrm{do}_D(X_I = U_I))$ has the Markov property of the intervention graph $D^{(I)}$, the DAG that we get from D by removing all arrows pointing to vertices in I.

We consider experiments based on datasets originating from multiple interventions. The family of targets $\mathcal{I} \subset \mathcal{P}([p])$, where $\mathcal{P}([p])$ denotes the power set of [p], lists all (distinct) intervention targets used in an experiment. A family of targets $\mathcal{I} = \{\emptyset, \{3\}, \{1, 4\}\}$, for example, characterizes an experiment in which observational data as well as data originating from an intervention at $X_3$ and data originating from a (simultaneous) intervention at $X_1$ and $X_4$ are measured. Throughout the paper, $\mathcal{I}$ always stands for a family of targets with the property that for each vertex $a \in [p]$, there is some target $I \in \mathcal{I}$ in which a is not intervened ($a \notin I$). This is e.g. the case when observational data is available ($\emptyset \in \mathcal{I}$). Two DAGs $D_1$ and $D_2$ are called $\mathcal{I}$-Markov equivalent ($D_1 \sim_{\mathcal{I}} D_2$) if they are statistically indistinguishable under an experiment consisting of interventions at the targets in $\mathcal{I}$; we refer to Hauser and Bühlmann (2012) for a more formal treatment.

Theorem 1 (Hauser and Bühlmann (2012)). Two DAGs $D_1$ and $D_2$ are $\mathcal{I}$-Markov equivalent if and only if

(i) $D_1$ and $D_2$ have the same skeleton and the same v-structures (that is, induced subgraphs of the form $a \to b \leftarrow c$), and

(ii) $D_1^{(I)}$ and $D_2^{(I)}$ have the same skeleton for all $I \in \mathcal{I}$.
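The two conditions of Thm. 1 can be checked mechanically on small examples. The following is a minimal sketch, assuming a DAG is stored as a set of arrows `(a, b)` meaning $a \to b$; the function names are illustrative and not taken from the paper.

```python
from itertools import combinations

def skeleton(arrows):
    """Undirected version of a set of arrows (a, b) meaning a -> b."""
    return {frozenset(e) for e in arrows}

def v_structures(arrows):
    """All induced subgraphs a -> b <- c with a and c non-adjacent."""
    skel = skeleton(arrows)
    parents = {}
    for a, b in arrows:
        parents.setdefault(b, set()).add(a)
    return {
        (frozenset((a, c)), b)
        for b, pa in parents.items()
        for a, c in combinations(sorted(pa), 2)
        if frozenset((a, c)) not in skel
    }

def intervention_graph(arrows, target):
    """D^(I): drop every arrow pointing into an intervened vertex."""
    return {(a, b) for a, b in arrows if b not in target}

def markov_equivalent(d1, d2, targets):
    """Check conditions (i) and (ii) of Thm. 1 for a family of targets."""
    if skeleton(d1) != skeleton(d2) or v_structures(d1) != v_structures(d2):
        return False
    return all(
        skeleton(intervention_graph(d1, t)) == skeleton(intervention_graph(d2, t))
        for t in targets
    )

# Two orientations of the chain 1 - 2 - 3, both without v-structures.
d1 = {(1, 2), (2, 3)}   # 1 -> 2 -> 3
d2 = {(2, 1), (2, 3)}   # 1 <- 2 -> 3
print(markov_equivalent(d1, d2, [frozenset()]))                  # True
print(markov_equivalent(d1, d2, [frozenset(), frozenset({2})]))  # False
print(markov_equivalent(d1, d2, [frozenset(), frozenset({3})]))  # True
```

In this toy example an intervention at vertex 2 separates the two DAGs, whereas an intervention at vertex 3 does not, which illustrates how strongly the gain depends on the target.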



An $\mathcal{I}$-Markov equivalence class of a DAG D is uniquely represented by its $\mathcal{I}$-essential graph $\mathcal{E}_{\mathcal{I}}(D)$ (Hauser and Bühlmann, 2012). This partially directed graph has the same skeleton as D; a directed edge in $\mathcal{E}_{\mathcal{I}}(D)$ represents an $\mathcal{I}$-essential arrow, i.e. an arrow that has the same orientation in all DAGs of the equivalence class; an undirected edge represents an arrow that has different orientations in different DAGs of the equivalence class. The concept of $\mathcal{I}$-essential graphs generalizes the one of CPDAGs which is well-known in the observational case (Spirtes et al., 2000; Andersson et al., 1997). We denote the $\mathcal{I}$-Markov equivalence class corresponding to an $\mathcal{I}$-essential graph G by $\mathbf{D}(G)$.

2.2 Active Learning 

Assume G is an $\mathcal{I}$-essential graph estimated from interventional data produced under the different interventions in $\mathcal{I}$. We consider two different greedy active learning approaches. In one step, the first one computes a single-vertex intervention that maximizes the number of orientable edges, while the second one computes an intervention target of arbitrary size that maximally reduces the clique number, i.e. the size of the largest clique of undirected edges (see Sect. 3). The motivation for the first approach is the attempt to quickly improve the identifiability of causal models with interventions at few variables; the motivation for the second approach is the conjecture of Eberhardt (2008) (which we prove in Cor. 2) stating that maximally reducing the clique number after each intervention yields full identifiability of causal models with a minimal number of interventions.

Formally, our two algorithms yield a solution to the following problems. The first one, called OptSingle, computes a vertex

$$v = \operatorname{arg\,min}_{v' \in [p]} \max_{D \in \mathbf{D}(G)} \xi\left(\mathcal{E}_{\mathcal{I} \cup \{\{v'\}\}}(D)\right), \qquad (1)$$

where $\xi(H)$ denotes the number of undirected edges in a graph H. The second algorithm, called OptUnb, computes a set

$$I = \operatorname{arg\,min}_{I' \subset [p]} \max_{D \in \mathbf{D}(G)} \omega\left(\mathcal{E}_{\mathcal{I} \cup \{I'\}}(D)\right), \qquad (2)$$

where $\omega(H)$ denotes the clique number of H (see also Sect. 3). The key ingredients for the efficiency of OptSingle (Alg. 2) and OptUnb (Alg. 3) are implementations that minimize the objective functions of Eq. (1) and (2), resp., without enumerating all DAGs in the equivalence class represented by G. Graph-theoretic results upon which our implementations are based are summarized in the next section.
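To make the objective of Eq. (1) concrete, the sketch below evaluates it by brute force for a small undirected chordal graph, i.e. for an observational essential graph with $\mathcal{I} = \{\emptyset\}$. It deliberately enumerates all DAGs of the equivalence class, which is exactly what Alg. 2 avoids, so it is only an illustration of the definition and not the paper's algorithm. The grouping by parent sets of v relies on Thm. 1; all function names are assumptions made for this example.

```python
from itertools import combinations, product

def acyclic(arrows, vertices):
    """Repeatedly strip vertices without incoming arrows; fail if none exists."""
    arrows, remaining = set(arrows), set(vertices)
    while remaining:
        sources = {v for v in remaining if not any(b == v for _, b in arrows)}
        if not sources:
            return False
        remaining -= sources
        arrows = {(a, b) for a, b in arrows if a not in sources}
    return True

def has_v_structure(arrows, edges):
    parents = {}
    for a, b in arrows:
        parents.setdefault(b, set()).add(a)
    return any(frozenset((a, c)) not in edges
               for pa in parents.values()
               for a, c in combinations(pa, 2))

def equivalence_class(vertices, edges):
    """All DAGs with skeleton `edges` and no v-structures (brute force)."""
    edge_list = [tuple(sorted(e)) for e in edges]
    dags = []
    for flips in product([False, True], repeat=len(edge_list)):
        arrows = {(b, a) if f else (a, b) for (a, b), f in zip(edge_list, flips)}
        if acyclic(arrows, vertices) and not has_v_structure(arrows, edges):
            dags.append(frozenset(arrows))
    return dags

def worst_case_undirected(vertices, edges, v):
    """max over D of the number of undirected edges after intervening at v."""
    groups = {}
    for dag in equivalence_class(vertices, edges):
        # DAGs with the same parents of v stay equivalent under {{}, {v}}.
        parents_of_v = frozenset(a for a, b in dag if b == v)
        groups.setdefault(parents_of_v, []).append(dag)
    worst = 0
    for dags in groups.values():
        essential = frozenset.intersection(*dags)  # arrows shared by the group
        worst = max(worst, len(edges) - len(essential))
    return worst

def opt_single_bruteforce(vertices, edges):
    return min(vertices, key=lambda v: worst_case_undirected(vertices, edges, v))

# Path 1 - 2 - 3 - 4: an inner vertex is the better single-vertex target.
V = [1, 2, 3, 4]
E = {frozenset((1, 2)), frozenset((2, 3)), frozenset((3, 4))}
print([worst_case_undirected(V, E, v) for v in V])  # [2, 1, 1, 2]
print(opt_single_bruteforce(V, E))                  # 2
```

The enumeration is exponential in the number of edges and is meant for toy graphs only.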

3 Graph Theoretic Background 

A graph is a pair G = (V, E), where V is a set of vertices and $E \subset (V \times V) \setminus \{(a, a) \mid a \in V\}$ is a set of edges. We always assume $V = [p] := \{1, 2, \ldots, p\}$ and let the vertices of a graph represent the p random variables $X_1, \ldots, X_p$.

An edge $(a, b) \in E$ with $(b, a) \in E$ is called undirected and denoted by $a - b$, whereas an edge $(a, b) \in E$ with $(b, a) \notin E$ is called directed and denoted by $a \to b$. G is called directed (or undirected, resp.) if all its edges are directed (or undirected, resp.). A cycle of length $k \geq 2$ is a sequence of k distinct vertices of the form $(a_0, a_1, \ldots, a_k = a_0)$ such that $(a_{i-1}, a_i) \in E$ for $i \in \{1, \ldots, k\}$; the cycle is directed if at least one edge is directed.

For a subset $A \subset V$ of the vertices of G, the induced subgraph on A is $G[A] := (A, E[A])$, where $E[A] := E \cap (A \times A)$. A v-structure is an induced subgraph of G of the form $a \to b \leftarrow c$. The skeleton of a graph G is the undirected graph $G^u := (V, E^u)$, $E^u := E \cup \{(a, b) \mid (b, a) \in E\}$. The parents of a vertex a are the vertices $\mathrm{pa}_G(a) := \{b \in V \mid b \to a \in G\}$, its neighbors are the vertices $\mathrm{ne}_G(a) := \{b \in V \mid a - b \in G\}$; the degree of a is defined as $\deg(a) := |\{b \in V \mid (a, b) \in E \vee (b, a) \in E\}|$, the maximum degree of G is $\Delta(G) := \max_{a \in V} \deg(a)$.

An undirected graph G = (V, E) is complete if all pairs of vertices are neighbors. A clique is a subset of vertices $C \subset V$ such that G[C] is complete. The clique number $\omega(G)$ of G is the size of the largest clique in G. G is chordal if every cycle of length $k \geq 4$ contains a chord, i.e. two nonconsecutive adjacent vertices.

A directed acyclic graph or DAG is a directed graph without cycles. A partially directed graph G = (V, E) is a chain graph if it contains no directed cycle; undirected graphs and DAGs are special cases of chain graphs. Let G' be the undirected graph we get by removing all directed edges from a chain graph G. The chain component $T_G(a)$ of a vertex a is the set of all vertices that are connected to a in G'. The set of all chain components of G is denoted by $\mathbf{T}(G)$; they form a partition of V. We extend the clique number to chain graphs G by the definition $\omega(G) := \max_{T \in \mathbf{T}(G)} \omega(G[T])$.
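The definition of chain components translates directly into a connected-components search on the undirected part of the graph. The following is a small sketch, assuming the edge set is given as ordered pairs as in the definition of a graph above; the function name is illustrative.

```python
def chain_components(vertices, edges):
    """Partition the vertices into chain components.

    `edges` is a set of ordered pairs; a pair (a, b) whose reverse (b, a)
    is also present encodes the undirected edge a - b.
    """
    undirected = {v: set() for v in vertices}
    for a, b in edges:
        if (b, a) in edges:
            undirected[a].add(b)
    components, seen = [], set()
    for start in vertices:
        if start in seen:
            continue
        # Depth-first search restricted to undirected edges.
        stack, comp = [start], {start}
        while stack:
            u = stack.pop()
            for w in undirected[u] - comp:
                comp.add(w)
                stack.append(w)
        seen |= comp
        components.append(comp)
    return components

# 1 - 2 - 3 -> 4 - 5: the arrow 3 -> 4 separates two chain components.
edges = {(1, 2), (2, 1), (2, 3), (3, 2), (3, 4), (4, 5), (5, 4)}
print(chain_components([1, 2, 3, 4, 5], edges))  # [{1, 2, 3}, {4, 5}]
```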

An ordering of a graph G, i.e. a permutation $\sigma: [p] \to [p]$, induces a total order on V by the definition $a \leq_\sigma b :\Leftrightarrow \sigma^{-1}(a) \leq \sigma^{-1}(b)$. An ordering $\sigma = (v_1, \ldots, v_p)$ is a perfect elimination ordering or PEO if for all i, $\mathrm{ne}_{G^u}(v_i) \cap \{v_1, \ldots, v_{i-1}\}$ is a clique in $G^u$. A topological ordering or TO of a DAG D is an ordering $\sigma$ such that $a \leq_\sigma b$ for each arrow $a \to b \in D$; we then say that D is oriented according to $\sigma$.

For the rest of this section, let G = (V, E) be an undirected graph. We consider a variant of the breadth-first search (BFS) called lexicographic BFS or LexBFS (Rose, 1970) that takes an ordering $(v_1, \ldots, v_p)$ of V and the edge set E as input and that outputs an ordering $\sigma = \mathrm{LexBFS}((v_1, \ldots, v_p), E)$ listing the vertices of V in the visited order. If $\{v_1, \ldots, v_k\}$ is a clique, $\sigma$ also starts with $v_1, \ldots, v_k$; we refer to Hauser and Bühlmann (2012) for details of such an implementation. For a set $A = \{a_1, \ldots, a_k\} \subset V$ and an additional vertex $v \in V \setminus A$, e.g., we use the notation $\mathrm{LexBFS}((A, v, \ldots), E)$ to denote a LexBFS-ordering produced from a start order of the form $(a_1, \ldots, a_k, v, \ldots)$, without specifying the orderings of A and $V \setminus (A \cup \{v\})$.

Proposition 1 (Rose et al. (1976)). Let G = (V, E) be an undirected chordal graph with a LexBFS-ordering $\sigma$. Then $\sigma$ is also a PEO on G. By orienting the edges of G according to $\sigma$, we get a DAG without v-structures.
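Prop. 1 can be tried out on a small chordal graph. The sketch below is a simple quadratic-time label-based LexBFS, assuming the graph is given as an adjacency dictionary; it is not the linear-time implementation referred to in the text, and the tie-breaking by start order is an implementation choice of this example.

```python
def lex_bfs(start_order, adj):
    """Return the vertices in LexBFS visiting order.

    Ties between equal labels are broken by position in `start_order`.
    """
    n = len(start_order)
    labels = {v: [] for v in start_order}
    remaining = list(start_order)
    visited = []
    for step in range(n):
        # max() keeps the first maximal element, i.e. the earliest remaining one.
        v = max(remaining, key=lambda u: labels[u])
        remaining.remove(v)
        visited.append(v)
        for w in adj[v]:
            if w in remaining:
                labels[w].append(n - step)  # labels are decreasing lists
    return visited

def is_peo(order, adj):
    """Check that the earlier neighbors of every vertex form a clique.

    When edges are oriented along `order`, the earlier neighbors are exactly
    the parents, so this also rules out v-structures.
    """
    pos = {v: i for i, v in enumerate(order)}
    for v in order:
        earlier = [u for u in adj[v] if pos[u] < pos[v]]
        for i, a in enumerate(earlier):
            if any(b not in adj[a] for b in earlier[i + 1:]):
                return False
    return True

# Two triangles sharing the edge 2 - 3 (a chordal graph).
adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3}}
sigma = lex_bfs([1, 2, 3, 4], adj)
print(sigma, is_peo(sigma, adj))   # [1, 2, 3, 4] True
print(is_peo([1, 4, 2, 3], adj))   # False: 1 and 4 precede 2 but are not adjacent
```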

Alg. 3 is strongly based on graph colorings. A k-coloring of G is a map $c: V \to [k]$; the coloring c is proper if $c(u) \neq c(v)$ for every edge $u - v \in G$. We say that G is k-colorable if it admits a proper k-coloring; the chromatic number $\chi(G)$ of G is the smallest integer k such that G is k-colorable. By greedily coloring the vertices of the graph (see Alg. 1), one gets a proper k-coloring with $k \leq \Delta(G) + 1$ in polynomial time (Chvátal, 1984).

Input : G = ([p], E): undirected graph; $\sigma = (v_1, \ldots, v_p)$: ordering of G.
Output: A proper coloring $c: [p] \to \mathbb{N}$
$c(v_1) \leftarrow 1$;
for i = 2 to p do
    $c(v_i) \leftarrow \min\{k \in \mathbb{N} \mid k \neq c(u) \;\forall u \in \{v_1, \ldots, v_{i-1}\} \cap \mathrm{ne}(v_i)\}$;
return c;

Algorithm 1: GreedyColoring(G, $\sigma$). Greedy algorithm that yields a proper coloring of G along an ordering $\sigma$.

For any undirected graph G, the bounds $\omega(G) \leq \chi(G) \leq \Delta(G) + 1$ hold. G is perfect if $\omega(H) = \chi(H)$ holds for every induced subgraph H of G. An ordering $\sigma$ of G is perfect if for any induced subgraph H of G, greedy coloring along the ordering induced by $\sigma$ yields an optimal coloring of H (i.e., a $\chi(H)$-coloring).

Proposition 2 (Chvátal (1984)). An ordering $\sigma$ of an undirected graph G is perfect if and only if G has no induced subgraph of the form $a - b - c - d$ with $a <_\sigma b$ and $d <_\sigma c$.

It can easily be seen that a PEO fulfills the requirement of Prop. 2; hence we get, together with Prop. 1, the following corollary.

Corollary 1. (i) A perfect elimination ordering on some graph G is perfect.
(ii) Any chordal graph has a perfect ordering.

Proposition 3 (Chvátal (1984)). A graph with a perfect ordering is perfect.

4 Optimal Intervention Targets

An $\mathcal{I}$-essential graph is a chain graph with chordal chain components. Their edges are oriented according to a PEO in the DAGs of the corresponding equivalence class; edge orientations of different chain components do not influence (i.e., additionally restrict) each other (Hauser and Bühlmann (2012), Thm. 18 and Prop. 16). We can thus restrict our search for optimal intervention targets to single chain components. We can even treat each chain component as an observational essential graph, as the following lemma shows; we skip a formal proof which is rather simple, but technical.

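Lemma 1. Consider an $\mathcal{I}$-essential graph $\mathcal{E}_{\mathcal{I}}(D)$ of some DAG D, and let $T \in \mathbf{T}(\mathcal{E}_{\mathcal{I}}(D))$. Furthermore, let $I \subset [p]$, $I \notin \mathcal{I}$, be an (additional) intervention target. Then we have

$$\mathcal{E}_{\mathcal{I} \cup \{I\}}(D)[T] = \mathcal{E}_{\{\emptyset, I \cap T\}}(D[T]).$$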

4.1 Single-vertex Interventions

We start with the treatment of the first active learning approach mentioned in Sect. 2.2. By virtue of the following lemma, the maximum in Eq. (1) can be calculated without enumerating all representative DAGs. The lemma easily follows from Thm. 1; we skip the proof.

Lemma 2. Let G be an $\mathcal{I}$-essential graph, and let $v \in [p]$. Assume $D_1, D_2 \in \mathbf{D}(G)$ such that $\{a \in \mathrm{ne}_G(v) \mid a \to v \in D_1\} = \{a \in \mathrm{ne}_G(v) \mid a \to v \in D_2\} =: C$. Then we have $D_1 \sim_{\mathcal{I}'} D_2$ under the family of targets $\mathcal{I}' = \mathcal{I} \cup \{\{v\}\}$.

The next proposition states that every clique in $\mathrm{ne}_G(v)$ is an admissible set C in the sense of Lem. 2 and vice versa. Algorithmically, a DAG D as described in Prop. 4 can be constructed with LexBFS; this motivates Alg. 2 which yields a solution of Eq. (1).

Proposition 4 (Andersson et al. (1997)). Let G be an undirected chordal graph, $a \in [p]$ and $C \subset \mathrm{ne}_G(a)$. There is a DAG $D \subset G$ with $D^u = G$ and $\{b \in \mathrm{ne}_G(a) \mid b \to a \in D\} = C$ which is oriented according to a PEO if and only if C is a clique.

Input : G = ([p], E): $\mathcal{I}$-essential graph.
Output: An optimal intervention vertex $v \in [p]$, or $\emptyset$ if G only has directed edges.
$v_{opt} \leftarrow \emptyset$; $\eta_{opt} \leftarrow 0$;
for v = 1 to p do
    $\eta_{min} \leftarrow p^2$;
    foreach clique $C \subset \mathrm{ne}_G(v)$ do
        $\sigma \leftarrow \mathrm{LexBFS}((C, v, \ldots), E[T_G(v)])$;
        $D \leftarrow$ DAG with skeleton $G[T_G(v)]$, topological ordering $\sigma$;
        $G' \leftarrow \mathcal{E}_{\{\emptyset, \{v\}\}}(D)$;
        $\eta \leftarrow$ number of arrows in G';
        if $\eta < \eta_{min}$ then $\eta_{min} \leftarrow \eta$;
    if $\eta_{min} > \eta_{opt}$ then $(v_{opt}, \eta_{opt}) \leftarrow (v, \eta_{min})$;
if $v_{opt} \neq \emptyset$ then return $v_{opt}$;
else return $\emptyset$;

Algorithm 2: OptSingle(G): yields a solution of Eq. (1).

4.2 Interventions at Targets of Arbitrary Size

We now proceed to the solution of Eq. (2). The following proposition, which was already conjectured by Eberhardt (2008), shows that the minimum in Eq. (2) only depends on the clique number of G:

$$\min_{I' \subset [p]} \max_{D \in \mathbf{D}(G)} \omega\left(\mathcal{E}_{\mathcal{I} \cup \{I'\}}(D)\right) = \lceil \omega(G)/2 \rceil.$$



Proposition 5. Let G be an undirected, connected, chordal graph on the vertex set V = [p]; such a graph is an observational essential graph.

(i) There is an intervention target $I \subset V$ such that for every DAG $D \in \mathbf{D}(G)$, we have

$$\omega(\mathcal{E}_{\{\emptyset, I\}}(D)) \leq \lceil \omega(G)/2 \rceil.$$

(ii) For every intervention target $I \subset [p]$ there is a DAG $D \in \mathbf{D}(G)$ such that

$$\omega(\mathcal{E}_{\{\emptyset, I\}}(D)) \geq \lceil \omega(G)/2 \rceil.$$

Proof. (i) Since G is chordal, we have $\chi(G) = \omega(G)$ by Cor. 1(ii) and Prop. 3. Let $c: V \to [\omega(G)]$ be a proper coloring of G. Define $I := c^{-1}([h])$ for $h := \lceil \omega(G)/2 \rceil$. With an intervention at the target I, at most the edges of G[I] and $G[V \setminus I]$ are unorientable for any causal structure $D \in \mathbf{D}(G)$ under the family of targets $\mathcal{I} := \{\emptyset, I\}$. Therefore the bound

$$\omega(\mathcal{E}_{\mathcal{I}}(D)) \leq \max\{\omega(G[I]), \omega(G[V \setminus I])\}$$

holds for every $D \in \mathbf{D}(G)$. It remains to show that both of these terms are bounded by h.

The induced subgraph G[I] is also perfect, and $c|_I$ is a proper h-coloring of G[I]. Hence we have $\omega(G[I]) = \chi(G[I]) \leq h$. Analogously, $c|_{V \setminus I}$ is a proper $(\omega(G) - h)$-coloring of $G[V \setminus I]$, hence we have $\omega(G[V \setminus I]) = \chi(G[V \setminus I]) \leq \omega(G) - h \leq h$ by definition of h.

(ii) Let C be a maximum clique in G, and define $C \cap I =: \{v_1, \ldots, v_k\}$ and $C \setminus I =: \{v_{k+1}, \ldots, v_\omega\}$. The LexBFS-ordering $\sigma := \mathrm{LexBFS}((v_1, \ldots, v_\omega, \ldots), E)$ starts with the vertices $v_1, \ldots, v_\omega$. Set $\mathcal{I} := \{\emptyset, I\}$ and let $D \in \mathbf{D}(G)$ be oriented according to $\sigma$.

We claim that the arrows in $D[C \cap I]$ and in $D[C \setminus I]$ are not $\mathcal{I}$-essential in D. For $v_i, v_j \in C \cap I$ (i < j), consider the ordering $\sigma' := \mathrm{LexBFS}((v_1, \ldots, v_j, \ldots, v_i, \ldots, v_\omega, \ldots), E)$, and $D' \in \mathbf{D}(G)$ which is obtained by orienting the edges of G according to $\sigma'$. We then have $D' \sim_{\mathcal{I}} D$:

• D and D' obviously have the same skeleton, and both have no v-structures.

• $D^{(I)}$ and $D'^{(I)}$ have the same skeleton because all arrows between a vertex $a \in I$ and another one $b \notin I$ point away from a.

For $v_i, v_j \in C \setminus I$, the argument is analogous, which proves the claim.

$\mathcal{E}_{\mathcal{I}}(D)$ contains the cliques $C \cap I$ and $C \setminus I$ of size k and $\omega(G) - k$ though. The fact that $\max\{k, \omega(G) - k\} \geq \lceil \omega(G)/2 \rceil$ completes the proof. □

Input : G = ([p], E): essential graph.
Output: An optimal intervention target $I \subset [p]$
$I \leftarrow \emptyset$;
foreach $T \in \mathbf{T}(G)$ do
    $\sigma \leftarrow \mathrm{LexBFS}(T, E[T])$;
    $c \leftarrow \mathrm{GreedyColoring}(G[T], \sigma)$;
    $\omega \leftarrow \max_{v \in T} c(v)$; $h \leftarrow \lceil \omega/2 \rceil$;
    $I \leftarrow I \cup c^{-1}([h])$;
return I;

Algorithm 3: OptUnb(G): yields a solution of Eq. (2); time complexity is $O(p + |E|)$.

The constructive proof shows that a minimizer I of Eq. (2) can be generated by means of an optimal coloring which we can get by greedy coloring along a LexBFS-ordering (see Prop. 1 and Cor. 1); this justifies Alg. 3.
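The construction can be sketched for a single undirected, connected, chordal chain component as follows. This is an illustrative quadratic-time sketch under the assumption that the component is given as an adjacency dictionary; the helper names are hypothetical, and the loop over all chain components of Alg. 3 is omitted.

```python
from math import ceil

def lex_bfs(start_order, adj):
    """Simple label-based LexBFS; ties are broken by `start_order`."""
    n = len(start_order)
    labels = {v: [] for v in start_order}
    remaining, visited = list(start_order), []
    for step in range(n):
        v = max(remaining, key=lambda u: labels[u])
        remaining.remove(v)
        visited.append(v)
        for w in adj[v]:
            if w in remaining:
                labels[w].append(n - step)
    return visited

def greedy_coloring(order, adj):
    """Alg. 1: give each vertex the smallest color unused by earlier neighbors."""
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        c = 1
        while c in used:
            c += 1
        color[v] = c
    return color

def opt_unb_component(adj):
    """Intervention target for one connected chordal chain component."""
    order = lex_bfs(sorted(adj), adj)
    color = greedy_coloring(order, adj)
    omega = max(color.values())  # equals the clique number on chordal graphs
    h = ceil(omega / 2)
    target = {v for v, c in color.items() if c <= h}
    return target, omega

# Two triangles sharing the edge 2 - 3; the clique number is 3.
adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3}}
target, omega = opt_unb_component(adj)
print(target, omega)  # {1, 2, 4} 3
```

In this example the target {1, 2, 4} and its complement {3} induce subgraphs with clique numbers 2 and 1, both at most $\lceil 3/2 \rceil = 2$, in line with Prop. 5(i).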

Since an $\mathcal{I}$-essential graph has only one representative DAG if and only if its clique number is 1, a direct consequence of Prop. 5 is a (sharp) upper bound on the number of interventions necessary to fully identify a causal model, as it was conjectured by Eberhardt (2008).

Corollary 2. Let G be an $\mathcal{I}$-essential graph. There is a set of $k = \lceil \log_2(\omega(G)) \rceil$ intervention targets $I_1, \ldots, I_k$ which are sufficient and in the worst case necessary to make the causal structure fully identifiable:

$$\mathcal{E}_{\mathcal{I} \cup \{I_1, \ldots, I_k\}}(D) = D \quad \forall D \in \mathbf{D}(G).$$

The intervention targets $I_1, \ldots, I_k$ of Cor. 2 can be constructed by iteratively running Alg. 3 on $G = \mathcal{E}_{\mathcal{I}}(D)$, $\mathcal{E}_{\mathcal{I} \cup \{I_1\}}(D)$, $\mathcal{E}_{\mathcal{I} \cup \{I_1, I_2\}}(D)$, etc. However, they could also be constructed at once by a modification of Alg. 3. Let $c: [p] \to [\omega(G)]$ be a function such that for each chain component $T \in \mathbf{T}(G)$, $c|_T$ is a proper coloring of G[T]. By defining $I_j$ as the set of all vertices v for which $c(v) - 1$ has a 1 in the j-th position of its binary representation, we make sure that for every pair of neighboring vertices u and v, there is at least one j such that $|\{u, v\} \cap I_j| = 1$; hence the edge between u and v is orientable in $\mathcal{E}_{\mathcal{I} \cup \{I_1, \ldots, I_k\}}(D)$. Shifting the colors from $1, \ldots, \omega(G)$ to $0, \ldots, \omega(G) - 1$ keeps the number of bits minimal: since the binary representation of $\omega(G) - 1$, the largest shifted color, has length $k = \lceil \log_2(\omega(G)) \rceil$ for $\omega(G) \geq 2$, this procedure creates a set of k intervention targets that fulfill the requirements of Cor. 2.
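The non-iterative construction from binary representations can be illustrated with a few lines of code. In this sketch the colors $1, \ldots, \omega$ are shifted to $0, \ldots, \omega - 1$ before their bits are read, an assumption made here so that $\lceil \log_2 \omega \rceil$ bits suffice; the function names are illustrative.

```python
def binary_targets(color):
    """Build targets I_1, ..., I_k from a proper coloring with colors 1..omega.

    Vertex v belongs to the j-th target if bit j of color[v] - 1 is set.
    """
    omega = max(color.values())
    k = (omega - 1).bit_length()  # equals ceil(log2(omega)) for omega >= 1
    return [{v for v, c in color.items() if ((c - 1) >> j) & 1} for j in range(k)]

def separates(targets, edges):
    """True if every edge has exactly one endpoint in at least one target."""
    return all(any(len(t & {a, b}) == 1 for t in targets) for a, b in edges)

# Complete graph on 4 vertices: a proper coloring needs 4 colors, so k = 2.
color = {1: 1, 2: 2, 3: 3, 4: 4}
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
targets = binary_targets(color)
print(targets)                         # [{2, 4}, {3, 4}]
print(separates(targets, edges))       # True
print(separates(targets[:1], edges))   # False: one target cannot split a 4-clique
```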

The problem of finding intervention targets to fully identify a causal model is related to the problem of finding separating systems of the chain components of essential graphs (Eberhardt, 2007). A separating system of an undirected graph G = (V, E) is a subset $\mathcal{F}$ of the power set of V such that for each edge $a - b \in G$, there is a set $F \in \mathcal{F}$ with $|F \cap \{a, b\}| = 1$. Cai (1984) has shown that the minimum separating system of a graph G has cardinality $\lceil \log_2(\chi(G)) \rceil$; this also proves Cor. 2. The proof of Cai (1984) uses arguments similar to ours given in the paragraph above for the non-iterative determination of the targets $I_1, \ldots, I_k$ of Cor. 2.

4.3 Discussion 

LexBFS and GreedyColoring have a time complexity of $O(p + |E|)$ when executed on a graph G = ([p], E). Thus, OptUnb (Alg. 3) also has a linear complexity. (In contrast, finding a minimum separating set on non-chordal graphs is NP-complete; Cai, 1984.) The time complexity of OptSingle (Alg. 2), on the other hand, depends on the size of the largest clique in the $\mathcal{I}$-essential graph G. By restricting OptSingle to $\mathcal{I}$-essential graphs with a bounded degree, its complexity is polynomial in p; otherwise, it is in the worst case exponential.

We emphasize that our two active learning algorithms do not optimize the same objective; OptUnb does not guarantee maximal identifiability after each intervention, and OptSingle does not guarantee a minimal number of single-vertex interventions to full identifiability. Consider e.g. the (observational) essential graph in Fig. 1(a). It is not hard to see that all its representatives are fully identifiable after at most two single-vertex interventions: the first intervention should be performed at vertex 3, the second one either at vertex 2 or 6. If the DAG in Fig. 1(b) represents the true causal model, however, OptSingle will need three steps to full identifiability; it will iteratively propose interventions at targets 6, 3 and 2.

Figure 1: Observational essential graph (a) and a representative (b).

In general, OptUnb does not yield an intervention target of minimal size. With two straightforward improvements, we could reduce the number of intervened vertices: first, we could take $h \leftarrow \lfloor \omega/2 \rfloor$ instead of $h \leftarrow \lceil \omega/2 \rceil$ in Alg. 3; the proof of Prop. 5 is also valid with this choice. Second, we could permute the colors produced by the greedy coloring such that $|c^{-1}(\{1\})| \leq |c^{-1}(\{2\})| \leq \ldots$. However, since an optimal coloring of a graph is not unique, not even up to permutation of colors, these heuristic improvements would still not guarantee a minimal intervention target with the properties required in Prop. 5.

5 Experimental evaluation 

We evaluated Alg. 2 and 3 in a simulation study on 4000 randomly generated causal models with vertex numbers $p \in \{10, 20, 30, 40\}$.

5.1 Methods 

We compared four active learning approaches: our two algorithms OptSingle and OptUnb, a purely random proposition of single-vertex interventions (denoted by Rand), and a slightly advanced random approach that randomly chooses any vertex which has at least one incident undirected edge (denoted by RandAdv). To measure the quality of the proposed interventions, we evaluated the algorithms together with an "oracle estimator", that is, an algorithm that yields the true $\mathcal{I}$-essential graph of some DAG. This corresponds to model estimation in the limit of infinite sample sizes.

For each vertex number $p \in \{10, 20, 30, 40\}$, we randomly generated 1000 DAGs with a binomial distribution of vertex degrees, having an expected degree of 3. Starting from the observational essential graph, we iteratively included the intervention targets proposed by the active learning algorithms. We defined the "survival time" of a DAG as the number of active learning steps needed for full identifiability, measured in intervention targets (T) or intervened variables (V). If a DAG was fully identifiable e.g. under the family $\mathcal{I} = \{\emptyset, \{1, 4\}, \{3\}\}$, we counted T = 2 (non-empty) targets and V = 3 variables. For each vertex number and algorithm, we estimated the "survival function", i.e. the probability $S_T(t) := P[T > t]$ or $S_V(v) := P[V > v]$, resp., with a Kaplan-Meier estimator (Kaplan and Meier, 1958).
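The two survival-time measures and the survival function can be illustrated as follows. This sketch assumes that V is the sum of the target sizes and that every DAG is followed until full identifiability, so that no observation is censored; in that case the Kaplan-Meier estimate reduces to the empirical survival function computed here. The function names and the sample values are made up for illustration.

```python
def count_targets_and_variables(family):
    """T = number of non-empty targets, V = total number of intervened variables."""
    nonempty = [t for t in family if t]
    return len(nonempty), sum(len(t) for t in nonempty)

def empirical_survival(times):
    """S(t) = fraction of DAGs with survival time > t, for t = 0..max(times)."""
    n = len(times)
    return [sum(1 for x in times if x > t) / n for t in range(max(times) + 1)]

# The family from the text: observational data plus targets {1, 4} and {3}.
print(count_targets_and_variables([set(), {1, 4}, {3}]))  # (2, 3)

# Hypothetical survival times of four DAGs, measured in targets.
print(empirical_survival([1, 2, 2, 3]))  # [1.0, 0.75, 0.25, 0.0]
```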



5.2 Results 

Fig. 2 shows the estimated survival functions of the active learning algorithms. Rand was clearly beaten by all competitors, and OptUnb dominated all other strategies in terms of intervention targets. However, if we measure the number of intervened vertices, OptUnb was even slightly worse than RandAdv. OptSingle gave a significant improvement over RandAdv; however, the step from Rand to RandAdv is much larger than the one from RandAdv to OptSingle.

When essential graphs are not given by an oracle, but estimated from finite samples with e.g. Greedy Interventional Equivalence Search (Hauser and Bühlmann, 2012), the convergence to the true model is slower due to estimation errors. Performance differences e.g. between RandAdv and OptSingle vanish for small sample sizes (data not shown).

6 Conclusion 

We developed two algorithms which propose optimal intervention targets: one that finds the single-vertex intervention which maximally increases the number of orientable edges (called OptSingle), and one that maximally reduces the clique number of the non-orientable edges with an intervention at arbitrarily many variables (called OptUnb). We proved a conjecture of Eberhardt (2008) concerning the number of interventions sufficient and in the worst case necessary for fully identifying a causal model by showing that OptUnb yields, when applied iteratively, a minimum set of intervention targets that guarantee full identifiability.

In a simulation study, we showed that both algorithms lead significantly faster to full identifiability than randomly chosen interventions. If we count the total number of intervened variables, however, OptUnb performed slightly worse than a random approach. This illustrates the fact that sequentially intervening on single variables yields in general more identifiability than intervening on those variables simultaneously.

Figure 2: Number of intervention steps needed for full identifiability of DAGs, measured in targets (T) or intervened variables (V); for algorithms proposing only single-vertex interventions, both numbers are the same. Thin lines: Kaplan-Meier estimates; colored bands: 95% confidence region.